- **Q16.** a) (3 marks) What are the different types of misses in cache memory? Briefly explain.
- b) (3 marks) It is the usual convention to use the higher-order bits of the address as tag bits and the middle bits for indexing (<Tag, Index, Offset>) in the set-associative mapping of a cache. What problem will arise if the higher-order bits are used for indexing?
- c) (3 marks) For a 32 KB direct-mapped cache with 128-byte cache blocks, calculate the number of sets. Find the address of the starting byte of the first word in the block that contains the address 0xA21967EF.
- **Q17.(5 marks)** The instruction pipeline of a RISC processor has the following stages: Instruction Fetch (IF), Instruction Decode (ID), Operand Fetch (OF), Perform Operation (PO) and Write-back (WB). The IF, ID, OF and WB stages take 1 clock cycle each for every instruction. Consider a sequence of 100 instructions. In the PO stage, 40 instructions take 3 clock cycles each, 35 instructions take 2 clock cycles each, and the remaining 25 instructions take 1 clock cycle each. Assume that there are no data hazards and no control hazards. What is the number of cycles required to complete the execution of the sequence?
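
One way to sanity-check an answer to Q17 is to simulate the pipeline. The sketch below is not part of the original paper. The function name `pipeline_cycles` and the modelling choices are assumptions: instructions issue in order, every stage other than PO takes 1 cycle, each stage serves one instruction at a time, and an instruction that has finished a stage may wait until the next stage is free. The question does not give the order of the 100 instructions; under this model the total does not depend on it.

```python
def pipeline_cycles(po_cycles):
    """Cycles needed to finish all instructions in an in-order 5-stage pipeline.

    po_cycles[i] is the PO-stage latency of instruction i. IF, ID, OF and WB
    take 1 cycle each. No data or control hazards are modelled (assumption
    taken from the question).
    """
    stage_count = 5   # IF, ID, OF, PO, WB
    po_index = 3      # position of the PO stage
    # Finish time of the previous instruction in each stage.
    finish_prev = [0] * stage_count
    for po in po_cycles:
        finish = [0] * stage_count
        ready = 0  # time at which this instruction has left the preceding stage
        for s in range(stage_count):
            latency = po if s == po_index else 1
            # A stage starts once the instruction has arrived and the stage is free.
            start = max(ready, finish_prev[s])
            finish[s] = start + latency
            ready = finish[s]
        finish_prev = finish
    return finish_prev[-1]


# Instruction mix from Q17: 40 x 3 cycles, 35 x 2 cycles, 25 x 1 cycle in PO.
latencies = [3] * 40 + [2] * 35 + [1] * 25
print(pipeline_cycles(latencies))  # prints 219
```

The result agrees with the closed form (k + n - 1) + extra PO cycles = (5 + 100 - 1) + (40 × 2 + 35 × 1) = 219.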

or

**Q18.(5 marks)** Consider a processor with a delayed branch that has three delay slots. Two compilers, compiler A and compiler B, can run on this processor. Compiler A can fill the first delay slot 60% of the time, the second delay slot 40% of the time, and the third delay slot 20% of the time. Compiler B can always fill all three delay slots. Assuming that branches account for 20% of all instructions and arithmetic/logic operations for the remaining 80% of the instructions for any program, what is the improvement in CPI with compiler B compared to the CPI with compiler A?
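
The arithmetic behind Q18 can be sketched as follows. This is an illustrative model, not part of the original paper: the function name `cpi_with_delay_slots` is hypothetical, and it assumes a base CPI of 1 for every instruction and that each unfilled delay slot costs exactly one wasted cycle.

```python
def cpi_with_delay_slots(branch_fraction, fill_rates, base_cpi=1.0):
    """Average CPI when each unfilled branch delay slot wastes one cycle.

    fill_rates[k] is the probability that the compiler fills delay slot k
    with useful work. base_cpi = 1.0 is an assumption, not given in Q18.
    """
    expected_unfilled = sum(1.0 - p for p in fill_rates)
    return base_cpi + branch_fraction * expected_unfilled


cpi_a = cpi_with_delay_slots(0.20, [0.60, 0.40, 0.20])
cpi_b = cpi_with_delay_slots(0.20, [1.0, 1.0, 1.0])
print(f"CPI A = {cpi_a:.2f}, CPI B = {cpi_b:.2f}, ratio = {cpi_a / cpi_b:.2f}")
# prints: CPI A = 1.36, CPI B = 1.00, ratio = 1.36
```

Under these assumptions compiler A leaves 0.4 + 0.6 + 0.8 = 1.8 slots empty per branch on average, so its CPI is 1 + 0.2 × 1.8 = 1.36.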

## Q19.(4 marks each) Differentiate between the following (logical points only):

- (a) RISC vs CISC
- (b) Loosely coupled vs tightly coupled multiprocessors
- (c) I/O-mapped I/O vs memory-mapped I/O
- **Q20.(6 marks)** Explain the principle of pipelining with the help of a space-time diagram. Derive an expression for the speedup. Explain the types of hazards with proper examples.
- **Q21.(4 marks)** (a) Consider a system that uses interrupt-driven I/O for a particular device with a data transfer rate of 10 KBPS. The processing of each interrupt takes 250 microseconds. What percentage of CPU time is consumed by the I/O device if it interrupts for every 2 bytes of data transferred?
- (b) (3 marks) Enumerate the steps that are undertaken on the arrival and acknowledgement of an interrupt.
- (c) (3 marks) Why is the stack data structure so important for the processing of interrupts?
- (d) (3 marks) In what way does an interrupt differ from a function call?
- (e) (3 marks) What role does the CPU play in a DMA-based data transfer system?
- **Q22. (4 marks each)** Consider a CPU that is connected to a pressure sensing system of an oil bath, and a temperature sensing system for the same bath. How would you design the system so that, if the temperature or pressure goes beyond its specified range, a corresponding alarm is activated? Otherwise, the processor will continually record a log of the temperature and pressure every 5 msec.
- i) Can you provide a block diagram for your solution?
- ii) Can you provide a high level description (pseudo-code) of the program that the CPU will execute?

Marks: Roll Number

## **Shiv Nadar Institute of Eminence**

CSD211:Computer Organization and Architecture
End-Term Examination

Duration: 3 Hours    Set No. C    M.M.: 100

Instructions:

- The question paper contains three parts. Answer all questions in Part-A, Part-B and Part-C.
- Part-A and Part-B need to be filled in the question paper only.
- Attempt carefully; cutting and erasing are not allowed in Part-A and Part-B.
- Written sheets will be provided for Part-C only. Unnecessarily lengthy answers will be penalized.
- Use of scientific calculators is permitted.
- Q7 to Q9 are MCQs having only one correct answer. Mark correct answer only.
- Options are provided between: Q12 vs Q13, and Q17 vs Q18.
- All the answers should be brief and to the point. Unnecessarily lengthy answers will be penalized.
- No clarification will be entertained during the examination. You may mention the assumptions while answering the questions, if required.

## Part-A (2 marks × 10 = 20 Marks)

**Q1.** The main memory of a computer has 2cm blocks while the cache has 2c blocks. If the cache uses the set-associative mapping scheme with 2 blocks per set, then block K of the main memory maps to the set

( ) (K mod m) of the cache ( ) (K mod c) of the cache ( ) (K mod 2c) of the cache

**Q2.** A memory system has a cache, a main memory and a disk. If the hit rate in the cache is 98% and the hit rate in the main memory is 99%, what is the average memory access time if it takes 2 cycles to access the cache, 150 cycles to fetch a line from main memory and 100,000 cycles to access the secondary memory?

Ans:

**Q3.** Consider a 5-stage instruction pipeline with latencies (in ns) of 2, 4, 6, 7 and 9 respectively. Find the average CPI of the non-pipelined CPU when the speedup achieved with respect to the pipeline is 2 (assume the ideal case for pipelining).

Ans:

**Q4.** True/False $(0.5 \times 4 = 2 \text{ marks})$

- a. High associativity in a cache reduces compulsory misses. ( )
- b. Both DRAM and SRAM must be refreshed periodically. ( )
- c. Forwarding can prevent all data-dependency pipeline hazards. ( )
- d. The performance of a pipelined processor suffers if the pipeline stages share hardware resources. ( )

**Q5.** Three processor-making companies C1, C2 and C3 release modifications to the three major components (X, Y and Z) of their processors to improve the execution time. The fractions of the total execution time spent by a running application on X, Y and Z are 25%, 55% and 20%, respectively. The speedups gained for components X, Y and Z by the individual companies are outlined below. Calculate the speedup obtained for the processor of each company. Rank the companies in terms of the speedup of their processors.

| Processor | X   | Y   | Z   |
|-----------|-----|-----|-----|
| C1        | 1.2 | 0.3 | 1.6 |
| C2        | 0.8 | 1.5 | 2.1 |
| C3        | 1.8 | 0.6 | 1.2 |

Ans.
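
The Q5 calculation can be sketched with the multi-component form of Amdahl's law: overall speedup = 1 / Σ (fraction_i / speedup_i). The snippet below is an illustration, not part of the original paper. The function name `overall_speedup` is hypothetical, and it assumes the three fractions cover the whole execution time and that a table entry below 1 means the component became slower.

```python
def overall_speedup(fractions, speedups):
    """Overall speedup when component i takes fractions[i] of the original
    execution time and is sped up by a factor of speedups[i]."""
    new_time = sum(f / s for f, s in zip(fractions, speedups))
    return 1.0 / new_time


fractions = [0.25, 0.55, 0.20]  # shares of X, Y, Z from Q5
companies = {
    "C1": [1.2, 0.3, 1.6],
    "C2": [0.8, 1.5, 2.1],
    "C3": [1.8, 0.6, 1.2],
}

results = {name: overall_speedup(fractions, s) for name, s in companies.items()}
ranking = sorted(results, key=results.get, reverse=True)
for name in ranking:
    print(f"{name}: {results[name]:.3f}")
# prints:
# C2: 1.291
# C3: 0.818
# C1: 0.462
```

Under these assumptions only C2 ends up faster overall, because component Y dominates the execution time and both C1 and C3 slow it down.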

**Q6.** A 2-way set-associative cache memory with a capacity of 16 KB is built using a block size of [...] words. The word length is 32 bits. The size of the physical address space is 2 GB. Find the number of bits for the tag field. Consider the memory to be word-addressable.

Ans:

**Q7.** A given application written in Java runs in 18 sec on a desktop processor. A new Java compiler is released that requires only 0.4 as many instructions as the old compiler. Unfortunately, it increases the CPI by 1.6. How fast can we expect the application to run using this new compiler?

Ans:

Consider the following floating-point format. The mantissa is a pure fraction in sign-magnitude form.

| Sign bit (15th bit) | Excess-64 exponent (14th bit to 8th bit) | Mantissa (7th bit to 0th bit) |
|---------------------|------------------------------------------|-------------------------------|

**Q9.** The decimal number 0.239 × 2<sup>13</sup> has the following hexadecimal representation (without normalization and rounding off):

( ) 0D24 ( ) 0D4D ( ) 4D0D ( ) 4D3D

**Q10.** The normalised representation for the above format is specified as follows. The mantissa has an implicit 1 preceding the binary (radix) point. Assume that only 0's are padded in while shifting a field. The normalised representation of the above number 0.239 × 2<sup>13</sup> is

( ) 0A20 ( ) 1134 ( ) 49D0 ( ) 4AE8

## Part-B (20 Marks)

**Q11.** (8 marks) Assume that we have three scenarios: a fully associative cache, a two-way set-associative cache, and a direct-mapped cache. The cache size is 256 bytes. The cache line size is 8 bytes. All variables are 4 bytes. Assume that we have separate instruction and data caches. Assume that a[1024] is assigned to memory location bytes 0 through 4095, b→4096 - 4099, c→4100 - 4103, q→4104 - 4107, r→4108 - 4111, i→4112 - 4115. Assume a cold cache and LRU replacement.

(A).

    for (i = 0; i < 16; i++) {
        b += q * a[i];
    }
    for (i = 0; i < 16; i++) {
        c += r * a[i];
    }

(B).

    for (i = 0; i < 16; i++) {
        b += q * a[64 * i];
    }
    for (i = 0; i < 16; i++) {
        c += r * a[64 * i];
    }

a. How many data cache reads, compulsory misses, capacity misses and conflict misses will occur in the fully associative cache in scenario (A)?

Ans:

b. How many data cache read misses will occur in the fully associative cache in scenario (B)?

| A). for(i=0:i<16:i++) {                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       
                                                                             |                                                                                                    | Part-B( 2                                                                       | 0 Marks)                                            |                                                                                                     |
|                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               
                                                                             | associative cache, and<br>3 bytes. All variables a<br>Assume that a(1024) i<br>q→4104 - 4107, c→41 | A direct mapped cache T<br>are 4 bytes. Assume that<br>is assigned to memory lo | he cache size in<br>we have separ<br>ocation byte 0 | n 256 bytes The cache line size i<br>ate instruction and data caches<br>through 4095. b→4096 – 4099 |
| for(i=0:i<16:i++) {                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           
                                                                             | A). for(i=0:i<16:i++)                                                                              |                                                                                 | (B). for(i=0:i                                      | <16:i++)                                                                                            |
| {                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             
                                                                             | t<br>b+                                                                                            | = q*a[i];                                                                       | 1                                                   | b+ = q*a[64*i];                                                                                     |
| c+ = r*a[i]; c+ = r*a[64*i]; }. How many data cache read, compulsory misses, capacity misses and conflict misses will occur fully associative cache in scenario (A).  Ins:  How many data cache read misses will occur in fully associative cache in scenario (B).                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            
                                                                             | •                                                                                                  | -)                                                                              | for(i=                                              | ſ                                                                                                   |
| n fully associative cache in scenario (A).  Ans:  . How many data cache read misses will occur in fully associative cache in scenario (B).                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    
                                                                             | C+                                                                                                 | = r*a[i] ;                                                                      |                                                     | c+ = r*a[64*i];                                                                                     |
| . How many data cache read misses will occur in fully associative cache in scenario (B).                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      
                                                                             | n fully associative cacl                                                                           |                                                                                 | es, capacity mis                                    | ses and conflict misses will occur                                                                  |
| 4113.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         
                                                                             | o. How many data cach                                                                              | ne read misses will occur i                                                     | n fully associat                                    | ive cache in scenario (B).                                                                          |
| c. How many data cache read misses will occur in direct mapping cache in scenario (B).                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        
                                                                             |                                                                                                    | <br>ne read misses will occur i                                                 | n direct mappin                                     | g cache in scenario (B).                                                                            |

d. How many data cache read misses will occur in 2 way set associative cache in scenario (B).

Ans:
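The miss counts asked for in Q11 can be explored with a small simulator. The following is a minimal sketch, not a model answer: the function names `count_misses` and `array_reads` are hypothetical, the 8-byte line size is an assumed value (the question's line size is not legible here), and the trace contains only the reads of `a`, on the assumption that the scalars `b`, `c`, `q`, `r` and `i` stay in registers.

```python
from collections import OrderedDict


def count_misses(addresses, cache_bytes, line_bytes, ways):
    """Count misses for a byte-address trace in an LRU set-associative cache.

    ways=1 models a direct mapped cache; ways equal to the number of
    lines models a fully associative cache.
    """
    num_lines = cache_bytes // line_bytes
    num_sets = num_lines // ways
    # One ordered dict per set; insertion order tracks recency of use.
    sets = [OrderedDict() for _ in range(num_sets)]
    misses = 0
    for addr in addresses:
        block = addr // line_bytes
        lines = sets[block % num_sets]
        if block in lines:
            lines.move_to_end(block)       # hit: mark as most recently used
        else:
            misses += 1
            if len(lines) == ways:
                lines.popitem(last=False)  # evict the least recently used line
            lines[block] = True
    return misses


def array_reads(stride, iterations=16, reads_per_iteration=2, elem_bytes=4):
    """Byte addresses of a[stride*i], read twice in each loop iteration."""
    return [stride * i * elem_bytes
            for i in range(iterations)
            for _ in range(reads_per_iteration)]


LINE_BYTES = 8  # assumption: the line size is not legible in the question
trace_b = array_reads(64)  # scenario (B): a[64*i]
for name, ways in [("fully associative", 256 // LINE_BYTES),
                   ("2-way set associative", 2),
                   ("direct mapped", 1)]:
    print(name, count_misses(trace_b, 256, LINE_BYTES, ways))
```

Changing `LINE_BYTES` or adding the scalar addresses to the trace shows how sensitive the counts are to those assumptions.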


Q12. (8 marks) Consider the following sequences of branch outcomes. T means the branch is taken; N means the branch is not taken.

Sequence 1: T, T, T, N

Sequence 2: T, N, T, T, N

- a) What is the accuracy of the always-taken branch predictor for each sequence? ( )
- b) What is the accuracy of the always-not-taken branch predictor for each sequence? ( )
- c) What is the accuracy of the 2-bit branch predictor for each sequence? Assume the predictor starts in the "weak" not-taken state (lower-right). ( )
- d) What is the accuracy of the 2-bit branch predictor for each sequence, if the sequence repeats thousands of times? ( )

or

Q13.(8 marks) We have a program core consisting of five conditional branches. The program core will be executed thousands of times. Below are the outcomes of each branch for one execution of the program core (T for taken, N for not taken).

- Branch 1: T-T-T
- Branch 2: N-N-N-N
- Branch 3: N-T-N-T-N-T
- Branch 4: T-N-T-N-T-N
- Branch 5: T-T-N-T-T

Assume the behavior of each branch remains the same for each program core execution. For the dynamic branch prediction schemes below, assume that each branch maps to a unique entry in the branch history table (BHT). Further assume that each BHT entry is initialized to the same state before each execution:

- (a) Always taken
- (b) Always not taken
- (c) 1-bit predictor, initialized to predict taken
- (d) 2-bit predictor, initialized to weakly predict taken

What is the branch prediction accuracy for branches 1 to 5 with prediction schemes (a) through (d)? Use the following table to fill out your final answer.

| Branch | Scheme (a) | Scheme (b) | Scheme (c) | Scheme (d) |
|--------|------------|------------|------------|------------|
| 1      |            |            |            |            |
| 2      |            |            |            |            |
| 3      |            |            |            |            |
| 4      |            |            |            |            |
| 5      |            |            |            |            |
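The prediction schemes in Q12 and Q13 can be checked with a short simulation. This is a sketch under stated assumptions: the function names are hypothetical, and the 2-bit predictor is modelled as a saturating counter (states 0 and 1 predict not taken, 2 and 3 predict taken). Some textbooks draw a different 2-bit state machine, so results may differ if the exam intends that variant. Each call simulates a single pass from the given initial state, which matches Q13's assumption that entries are re-initialized before each execution.

```python
def parse(outcomes):
    """Turn 'T-T-N' or 'T, T, N' into a list of booleans (True = taken)."""
    return [ch == "T" for ch in outcomes if ch in "TN"]


def static_accuracy(outcomes, predict_taken):
    """Accuracy of an always-taken or always-not-taken predictor."""
    seq = parse(outcomes)
    return sum(taken == predict_taken for taken in seq) / len(seq)


def one_bit_accuracy(outcomes, start_taken=True):
    """1-bit predictor: predict whatever the branch did last time."""
    seq = parse(outcomes)
    prediction, hits = start_taken, 0
    for taken in seq:
        hits += prediction == taken
        prediction = taken
    return hits / len(seq)


def two_bit_accuracy(outcomes, start=2):
    """2-bit saturating counter (assumed variant); start=2 is weakly taken."""
    seq = parse(outcomes)
    counter, hits = start, 0
    for taken in seq:
        hits += (counter >= 2) == taken
        counter = min(3, counter + 1) if taken else max(0, counter - 1)
    return hits / len(seq)


branches = ["T-T-T", "N-N-N-N", "N-T-N-T-N-T", "T-N-T-N-T-N", "T-T-N-T-T"]
for number, outcomes in enumerate(branches, start=1):
    print(number,
          static_accuracy(outcomes, True),
          static_accuracy(outcomes, False),
          one_bit_accuracy(outcomes),
          two_bit_accuracy(outcomes))
```

For Q12 c), the weakly-not-taken start corresponds to `start=1` in this counter encoding.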

Q14.(4 marks) Consider a processor with 64 registers and an instruction set of 12 instructions. Each instruction has 5 distinct fields, namely opcode, two source register identifiers, one destination register identifier and a 12-bit immediate value. Each instruction must be stored in memory in a byte-aligned fashion. If a program has 100 instructions, how much memory (in bytes) is consumed by the program text?

Ans:
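The arithmetic behind Q14 can be sketched as follows, assuming a fixed-length encoding in which every field uses the minimum number of bits needed. The helper names `bits_to_encode` and `program_text_bytes` are hypothetical.

```python
def bits_to_encode(choices):
    """Minimum number of bits needed to distinguish `choices` values."""
    return (choices - 1).bit_length()


def program_text_bytes(num_registers, num_opcodes, num_register_fields,
                       immediate_bits, num_instructions):
    """Size of the program text when each instruction is padded to whole bytes."""
    bits = (bits_to_encode(num_opcodes)
            + num_register_fields * bits_to_encode(num_registers)
            + immediate_bits)
    bytes_per_instruction = (bits + 7) // 8  # round up to a byte boundary
    return bytes_per_instruction * num_instructions


# 64 registers, 12 opcodes, 3 register fields, 12-bit immediate, 100 instructions
print(program_text_bytes(64, 12, 3, 12, 100))
```

The rounding step is what makes byte alignment matter: the fields total 34 bits under these assumptions, so each instruction occupies 5 bytes rather than 4.25.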

## Part-C (need to be written on answer sheet) (60 Marks)

Q15.(4 marks) Consider an unpipelined processor. Assume that it has a 1 nsec clock cycle and that it uses 4 cycles for ALU operations and branches, and 5 cycles for memory operations. Assume that the relative frequencies of these operations are 40%, 20% and 40% respectively. Suppose that, due to clock skew and setup, pipelining the processor adds 0.2 nsec of overhead to the clock. How much speedup will we gain from pipelining?
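One way to set up the Q15 calculation is sketched below. It assumes an ideal pipeline that completes one instruction per clock (no stalls), so the only cost of pipelining is the 0.2 nsec added to the cycle time. The symbols $T_{\text{unpipelined}}$ and $T_{\text{pipelined}}$ are introduced here for the average time per instruction.

$$
T_{\text{unpipelined}} = 1\,\text{ns} \times (0.4 \times 4 + 0.2 \times 4 + 0.4 \times 5) = 4.4\,\text{ns}
$$

$$
T_{\text{pipelined}} = 1\,\text{ns} + 0.2\,\text{ns} = 1.2\,\text{ns}
$$

$$
\text{Speedup} = \frac{T_{\text{unpipelined}}}{T_{\text{pipelined}}} = \frac{4.4}{1.2} \approx 3.67
$$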